Impact of Feature Selection Techniques for Tweet Sentiment Classification
Abstract
Sentiment analysis of tweets is a powerful application of mining social media sites that can be used for a variety of social sensing tasks. Common feature engineering techniques frequently generate a large number of features to represent tweets. Many of these features may degrade classifier performance and increase computational cost. Feature selection techniques can be used to select an optimal subset of features, reducing the computational cost of training a classifier and potentially improving classification performance. Despite its benefits, feature selection has received little attention within the tweet sentiment domain. We study the impact of ten filter-based feature selection techniques on classification performance, using ten feature subset sizes and four different learners. Our experimental results demonstrate that feature selection can significantly improve classification performance in comparison to not using feature selection. Additionally, both the choice of ranker and the feature subset size significantly impact classifier performance. To the best of our knowledge, this is the first work that extensively studies feature selection's effect on tweet sentiment classification.
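As a rough illustration of what a filter-based ranker does, the sketch below scores each token independently of any classifier and keeps the top k. The abstract does not name the ten rankers studied, so the chi-squared statistic is assumed here only as one common filter. The function names `chi2_scores` and `select_top_k` and the four toy tweets are hypothetical and are not taken from the paper.

```python
def chi2_scores(docs, labels):
    """Chi-squared score per token from its 2x2 presence-by-class table.

    Assumes binary labels (1 = positive, 0 = negative) and simple
    whitespace tokenization; both are illustrative choices.
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    token_sets = [set(d.lower().split()) for d in docs]
    vocab = set().union(*token_sets)
    scores = {}
    for tok in vocab:
        # a: positive tweets containing tok, b: negative tweets containing tok
        a = sum(1 for s, y in zip(token_sets, labels) if tok in s and y == 1)
        b = sum(1 for s, y in zip(token_sets, labels) if tok in s and y == 0)
        c = n_pos - a  # positive tweets without tok
        d = n_neg - b  # negative tweets without tok
        denom = (a + b) * (c + d) * n_pos * n_neg
        scores[tok] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring tokens; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tok for tok, _ in ranked[:k]]


# Hypothetical toy data, for illustration only.
tweets = ["good great day", "good fun", "bad awful day", "bad sad"]
labels = [1, 1, 0, 0]

scores = chi2_scores(tweets, labels)
selected = select_top_k(scores, k=2)
print(selected)
```

Varying `k` corresponds to the feature subset size the abstract mentions. A classifier would then be trained only on the selected tokens.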
Similar resources
Necessity of Feature Selection when Augmenting Tweet Sentiment Feature Spaces with Emoticons
Tweet sentiment classification seeks to identify the emotional polarity of a tweet. One potential way to enhance classification performance is to include emoticons as features. Emoticons are representations of faces expressing various emotions in text. They are created through combinations of letters, punctuation marks and symbols, and are frequently found within tweets. While emoticons have be...
Micro-Blog Emotion Classification Method Research Based on Cross-Media Features
Although sentiment analysis of tweets has attracted increasing attention in recent years, most existing methods analyze only the text. Because emotional expression is often ambiguous, users are more likely to combine several media, such as words and images, to express their feelings. This paper proposes a classification method of tweet emotion based on fusion feature, which combines the...
Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines
In this paper, principles and existing feature selection methods for classifying and clustering data are introduced. To that end, categorizing frameworks for finding selected subsets, namely search-based and non-search-based procedures, as well as evaluation criteria and data mining tasks, are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...
Impact of Feature Selection on Micro-Text Classification
Social media datasets, especially Twitter tweets, are popular in the field of text classification. Tweets are a valuable source of microtext (sometimes referred to as “micro-blogs”), and have been studied in domains such as sentiment analysis, recommendation systems, spam detection, and clustering, among others [6]. Tweets often include keywords referred to as “Hashtags” that can be used as labels fo...
Publication date: 2015